Bootstrapping Regression Models Appendix to An R and S-PLUS Companion to Applied Regression
Abstract
Bootstrapping is a general approach to statistical inference based on building a sampling distribution for a statistic by resampling from the data at hand. The term 'bootstrapping,' due to Efron (1979), is an allusion to the expression 'pulling oneself up by one's bootstraps' – in this case, using the sample data as a population from which repeated samples are drawn. At first blush, the approach seems circular, but has been shown to be sound. Two S libraries for bootstrapping are associated with extensive treatments of the subject: Efron and Tibshirani's (1993) bootstrap library, and Davison and Hinkley's (1997) boot library. Of the two, boot, programmed by A. J. Canty, is somewhat more capable, and will be used for the examples in this appendix. There are several forms of the bootstrap, and, additionally, several other resampling methods that are related to it, such as jackknifing, cross-validation, randomization tests, and permutation tests. I will stress the nonparametric bootstrap.

Suppose that we draw a sample S = {X1, X2, ..., Xn} from a population P = {x1, x2, ..., xN}; imagine further, at least for the time being, that N is very much larger than n, and that S is either a simple random sample or an independent random sample from P; I will briefly consider other sampling schemes at the end of the appendix. It will also help initially to think of the elements of the population (and, hence, of the sample) as scalar values, but they could just as easily be vectors (i.e., multivariate). Now suppose that we are interested in some statistic T = t(S) as an estimate of the corresponding population parameter θ = t(P). Again, θ could be a vector of parameters and T the corresponding vector of estimates, but for simplicity assume that θ is a scalar.
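The appendix itself works in R with the boot library; purely as an illustrative sketch of the nonparametric bootstrap described above, the following self-contained Python fragment resamples the data with replacement R times, computes the statistic T (here the sample mean) on each resample, and uses the spread of the bootstrap replicates to estimate the standard error. All names (`bootstrap`, `mean`, the toy data) are hypothetical and chosen for this example only.

```python
import random

random.seed(0)  # for reproducibility of the resampling

def bootstrap(sample, statistic, R=2000):
    """Nonparametric bootstrap: draw R resamples of size n with
    replacement from the sample, computing the statistic on each."""
    n = len(sample)
    return [statistic([random.choice(sample) for _ in range(n)])
            for _ in range(R)]

def mean(xs):
    return sum(xs) / len(xs)

# Toy data standing in for the sample S = {X1, ..., Xn}
data = [2.1, 3.4, 1.9, 4.0, 2.7, 3.1, 2.5, 3.8, 2.2, 3.0]

reps = bootstrap(data, mean)          # bootstrap replicates of T
m = mean(reps)
boot_se = (sum((r - m) ** 2 for r in reps) / (len(reps) - 1)) ** 0.5

print(mean(data), boot_se)
```

The distribution of the replicates in `reps` plays the role of the sampling distribution of T, with the observed sample standing in for the population P.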
A traditional approach to statistical inference is to make assumptions about the structure of the population (e.g., an assumption of normality), and, along with the stipulation of random sampling, to use these assumptions to derive the sampling distribution of T, on which classical inference is based. In certain instances, the exact distribution of T may be intractable, and so we instead derive its asymptotic distribution. This familiar approach has two potentially important deficiencies: if the assumptions about the population are wrong, then the derived sampling distribution of T may be seriously inaccurate; and when we must resort to an asymptotic approximation, that approximation may be poor for samples of the size at hand.
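In the regression setting that this appendix addresses, one common nonparametric scheme is case (random-x) resampling: the (x, y) observations are resampled as pairs and the coefficient re-estimated on each resample, so no distributional assumption about the errors is needed. The sketch below, again an illustrative Python stand-in for the R code in the appendix (all names and the simulated data are hypothetical), bootstraps a least-squares slope and forms a percentile confidence interval from the replicates.

```python
import random

random.seed(1)

# Simulated data: y = 1 + 2x + Gaussian noise
x = [i / 10 for i in range(30)]
y = [1 + 2 * xi + random.gauss(0, 0.5) for xi in x]

def slope(xs, ys):
    """Least-squares slope of the regression of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx

# Case resampling: draw (x, y) pairs with replacement, refit each time
R = 2000
n = len(x)
boot_slopes = []
for _ in range(R):
    idx = [random.randrange(n) for _ in range(n)]
    boot_slopes.append(slope([x[i] for i in idx], [y[i] for i in idx]))

# Percentile 95% confidence interval from the bootstrap replicates
boot_slopes.sort()
ci = (boot_slopes[int(0.025 * R)], boot_slopes[int(0.975 * R) - 1])
print(slope(x, y), ci)
```

Because the pairs are resampled whole, the procedure reflects the sampling variability of both x and y without assuming normal errors, in contrast to the classical normal-theory interval.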